Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins
Abstract
In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure we defined 4670 clusters of related sequences in this space. Of these clusters, 1421 are centered on a sequence of known structure. All 4670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric. These scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to organize these clusters in a meta-organization. The first uses a graph theory method and clusters the clusters in a hierarchical organization. This organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing tools for sequence analysis. The second uses a variation on a multidimensional scaling technique to embed the clusters in a low-dimensional real space. This last approach resulted in a projection of the protein space onto a 2D plane that provides us with a bird's-eye view of the protein space. Based on this map we suggest a list of possible target sequences with unknown structure that are likely to adopt new, unknown folds.
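The idea of "clustering the clusters" from a unified pairwise metric can be illustrated with a small sketch. The abstract does not specify the graph theory method used, so the code below substitutes plain average-linkage agglomerative clustering as a stand-in; it is not the authors' procedure. The function name `average_linkage`, the labels and the distance values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage (illustrative stand-in).

    dist: symmetric matrix (list of lists) of pairwise distances.
    labels: one name per row of dist.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    # Each active group is a tuple of original item indices.
    groups = [(i,) for i in range(len(labels))]
    merges = []

    def group_distance(a, b):
        # Average of all pairwise distances between the two groups.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(groups) > 1:
        # Pick the closest pair of groups and merge them.
        a, b = min(combinations(groups, 2), key=lambda p: group_distance(*p))
        merges.append((
            sorted(labels[i] for i in a),
            sorted(labels[i] for i in b),
            group_distance(a, b),
        ))
        groups = [g for g in groups if g not in (a, b)] + [a + b]
    return merges


# Hypothetical distances between four sequence clusters (made-up values).
LABELS = ["c1", "c2", "c3", "c4"]
DIST = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]

if __name__ == "__main__":
    for left, right, d in average_linkage(DIST, LABELS):
        print(left, "+", right, "at distance", d)
```

The merge history read bottom-up gives a hierarchy over the clusters. In the setting of the abstract, the input matrix would hold the unified structure or profile scores between clusters rather than toy numbers.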
Similar resources
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome, which is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...
Prediction of 3D protein Structure based on Mutation of AKAP3 and PLOD3 Gene in Case of Non-Obstructive Azoospermia
Background: The present study has been designed with the aim of evaluating A-kinase anchoring proteins 3 (AKAP3) and Procollagen-Lysine, 2-Oxoglutarate 5-Dioxygenase 3 (PLOD3) gene mutations and prediction of 3D protein structure for ligand binding activity in the cases of non-obstructive azoospermic male. Materials and Methods: Clinically diagnosed cases of non-obstructive azoos...
Genetic Structure of SSR1 & SSR2 loci from Iranian Mycobacterium Avium Subspecies Paratuberculosis Isolates by a Short Sequence Repeat Analysis Approach
Abstract Background and Objective: Paratuberculosis has been repeatedly reported from Iranian ruminant herds. The extreme fastidious nature of Mycobacterium avium subspecies paratuberculosis hinders genomic diversity studies of the pathogen. Short Sequence Repeat analysis is one of the genome-based approaches recently developed to overcome this d...
Publication date: 2001